Cihang Xie

University of California, Santa Cruz

Spiral RoPE: Rotate Your Rotary Positional Embeddings in the 2D Plane

Feb 03, 2026

Controllable Layered Image Generation for Real-World Editing

Jan 21, 2026

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Jan 21, 2026

SimpleMem: Efficient Lifelong Memory for LLM Agents

Jan 05, 2026

SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards

Nov 10, 2025

LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation

Oct 27, 2025

Alignment Tipping Process: How Self-Evolution Pushes LLM Agents Off the Rails

Oct 06, 2025

GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset

Jul 28, 2025

MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning

May 22, 2025

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

May 07, 2025